1. Introduction

When we run a machine learning program, especially when tuning network parameters, there are usually many parameters to adjust, and the space of parameter combinations is complicated. In accordance with the principle of attention > time > money, tuning parameters by hand costs too much attention and is not worth it. A for-loop (or nested for-loop) approach is constrained to rigid nesting levels: it is neither concise nor flexible, carries a high attention cost, and is error-prone.
The first tool to be introduced is the scikit-learn model selection API, GridSearchCV.
Website link: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html

Section I: Usage of the GridSearchCV function
sklearn.grid_search.GridSearchCV(
    estimator,   # the model you want to train
    param_grid,  # the candidate parameters, as a dictionary
    ...
)
Grid search is a hyper-parameter optimization technique. In scikit-learn, it is provided by the GridSearchCV class. When constructing the class, you must provide a hyper-parameter dictionary as the param_grid argument: a mapping from model parameter names to lists of candidate values. By default, accuracy is the score being optimized, but other metrics can be specified through the scoring parameter of GridSearchCV.
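To make the usage concrete, here is a minimal sketch; the estimator, grid values, and dataset are my own assumptions for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# param_grid maps each hyper-parameter name to a list of candidate values
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}

# scoring='accuracy' makes the optimization target explicit; cv=5 folds
search = GridSearchCV(SVC(), param_grid, scoring='accuracy', cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)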
Scikit-learn provides two generic approaches to sampling search candidates: for given values, GridSearchCV exhaustively considers all parameter combinations, while RandomizedSearchCV can sample a given number of candidates from a parameter space with a specified distribution. After describing these tools, best practices applicable to both approaches are detailed.

3.2.1. Exhaustive Grid Search

The grid search provided by GridSearchCV exhaustively generates candidates from a grid of parameter values specified with the param_grid argument.
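For contrast with the exhaustive search, here is a minimal RandomizedSearchCV sketch; the dataset, distributions, and iteration budget are assumptions for illustration:

from scipy.stats import expon
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# continuous distributions rather than fixed lists; n_iter caps the budget
param_dist = {'C': expon(scale=10), 'gamma': expon(scale=0.1)}
rs = RandomizedSearchCV(SVC(), param_dist, n_iter=20, cv=5, random_state=0)
rs.fit(X, y)
print(rs.best_params_)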
I used LightGBM and XGBoost on the Kaggle Digit Recognizer competition and tried GridSearchCV to tune parameters, mainly max_depth, learning_rate, n_estimators, and so on, finally reaching 0.9747.
My ability is limited, and I do not know how to tune the parameters further.
In addition, I have not managed to use GridSearchCV with XGBoost; if an expert knows how, please let me know.
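For what it's worth, XGBoost's scikit-learn wrapper can in principle be dropped into GridSearchCV; a minimal sketch, assuming xgboost is installed (the dataset and grid values are placeholders of mine):

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = load_digits(return_X_y=True)

# XGBClassifier follows the scikit-learn estimator API, so GridSearchCV accepts it
param_grid = {'max_depth': [3, 5], 'learning_rate': [0.05, 0.1],
              'n_estimators': [100, 200]}
search = GridSearchCV(XGBClassifier(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)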
This article uses grid search, that is, the GridSearchCV class. Of course, you can also tune parameters with cross_val_score, but personally I find it less convenient than GridSearchCV. Here we only discuss using GridSearchCV to tune the parameters of the RBF-kernel SVM. The parameters we pay attention to are the penalty coefficient C and the kernel coefficient gamma; for each candidate combination, the cross-validation error is obtained.
When we provide a series of alpha values, we can use GridSearchCV to find the optimal alpha automatically:
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer versions
gscv = GridSearchCV(Model(), dict(alpha=alphas), cv=3).fit(X, y)  # Model is a placeholder estimator class
Scikit-learn also provides estimators with built-in cross-validation, such as:

from sklearn.linear_model import RidgeCV, LassoCV
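A minimal RidgeCV sketch; the dataset and alpha grid are assumptions for illustration:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV

X, y = load_diabetes(return_X_y=True)

# RidgeCV searches the alpha grid internally via cross-validation
model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
print(model.alpha_)  # the selected regularization strength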
GridSearchCV performs a brute-force search over the parameter lists we specify and evaluates the effect of each combination on model performance to obtain the optimal combination of parameters.

from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe_svc = Pipeline([('scl', StandardScaler()),
                     ('clf', SVC(random_state=1))])
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{'clf__C': param_range, 'clf__kernel': ['linear']},
              {'clf__C': param_range, 'clf__gamma': param_range,
               'clf__kernel': ['rbf']}]
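A hedged sketch of how this search would typically be run end to end; I substitute the digits dataset, the newer sklearn.model_selection import path, and smaller grids to keep the run short:

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    *load_digits(return_X_y=True), random_state=1)

pipe_svc = Pipeline([('scl', StandardScaler()), ('clf', SVC(random_state=1))])
param_grid = [{'clf__C': [0.1, 1.0, 10.0], 'clf__kernel': ['linear']},
              {'clf__C': [0.1, 1.0, 10.0], 'clf__gamma': [0.01, 0.1, 1.0],
               'clf__kernel': ['rbf']}]

gs = GridSearchCV(pipe_svc, param_grid=param_grid, scoring='accuracy', cv=10)
gs.fit(X_train, y_train)
print(gs.best_score_, gs.best_params_)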
# from the scikit-learn randomized-search example; clf, param_dist,
# n_iter_search, and the report helper are defined earlier in that example
from time import time

random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search)
start = time()
random_search.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidates"
      " parameter settings." % ((time() - start), n_iter_search))
report(random_search.grid_scores_)

# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [1, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid)
This post builds on the previous post (scikit-learn feature selection, XGBoost regression prediction, and model optimization in practice), so please read that article before reading this one.
The work I did earlier was basically about feature selection, and here I want to write down some small experiences with XGBoost parameter tuning. I have seen a lot of related content on the web before, mostly translations of an English blog.
XGBoost can plot the feature importance of a well-trained model, which lets you know which variables should be preserved and which can be discarded.
The following two imports are needed:
from xgboost import plot_importance
from matplotlib import pyplot
Compared with the previous code, the change is simply adding two lines after fit to plot the feature importance:
model.fit(X, y)
plot_importance(model)
pyplot.show()
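Put together as a self-contained runnable sketch; the dataset and the n_estimators value are placeholders of mine:

from matplotlib import pyplot
from sklearn.datasets import load_digits
from xgboost import XGBClassifier, plot_importance

X, y = load_digits(return_X_y=True)

model = XGBClassifier(n_estimators=50)
model.fit(X, y)

plot_importance(model)  # bar chart of per-feature importance scores
pyplot.show()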
4. Parameter adjustment
How should the parameters be adjusted? Below is the general practice for the three parameters mentioned earlier (max_depth, learning_rate, n_estimators); a sketch follows.
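A minimal sketch of one common practice: tune one parameter at a time, fixing each best value before moving on. The dataset, grids, and ordering are my assumptions, not the author's exact recipe:

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = load_digits(return_X_y=True)

best = {}  # accumulates the best value found for each parameter
for grid in ({'n_estimators': [100, 300, 500]},
             {'max_depth': [3, 5, 7]},
             {'learning_rate': [0.05, 0.1, 0.3]}):
    gs = GridSearchCV(XGBClassifier(**best), grid, cv=3).fit(X, y)
    best.update(gs.best_params_)
print(best)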
from matplotlib.colors import Normalize
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit  # stratified shuffle-split cross-validation
from sklearn.model_selection import GridSearchCV

# Utility function to move the midpoint of a colormap to be around
# the values of interest.
class MidpointNormalize(Normalize):
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        Normalize.__init__(self, vmin, vmax, clip)
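These imports set up the classic C/gamma heat-map study; the search portion alone looks roughly like this (grid bounds follow the scikit-learn documentation example; the plotting that uses MidpointNormalize is omitted):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# logarithmic grids over the RBF kernel's two hyper-parameters
C_range = np.logspace(-2, 10, 13)
gamma_range = np.logspace(-9, 3, 13)
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

grid = GridSearchCV(SVC(), param_grid=dict(gamma=gamma_range, C=C_range), cv=cv)
grid.fit(X, y)
print("best params:", grid.best_params_, "score: %0.2f" % grid.best_score_)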
Otherwise, the overall model is not suitable. In the following case analysis, the "overall model performance" we talk about refers to the average accuracy; please keep this in mind.

2.3.1 Random Forest parameter adjustment case: Digit Recognizer
Here we select Digit Recognizer, one of the 101 teaching competitions on Kaggle, as the case to demonstrate the process of tuning RandomForestClassifier. Of course, we should not set different parameters manually and then train the model each time; GridSearchCV automates this search.
One remedy is to reduce the number of features. The generally preferred method, however, is to strengthen regularization (for LinearSVC this means reducing the value of C). Regularization is among the most effective ways to reduce overfitting:
# plot_learning_curve is a helper assumed to be defined earlier in the post
plot_learning_curve(LinearSVC(C=0.1), 'LinearSVC(C=0.1)', X, y,
                    ylim=(0.8, 1.0), train_sizes=np.linspace(.05, 0.2, 5))
After adjusting the regularization coefficient, we find a certain degree of relief, but there is still a problem: the coefficient was picked by hand, and we have no way to adjust it automatically.
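This is exactly where GridSearchCV fits. A minimal sketch of automating the choice of C; the dataset and grid are placeholders of mine:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)

# let GridSearchCV pick C instead of hand-tuning it
gs = GridSearchCV(LinearSVC(max_iter=5000), {'C': np.logspace(-3, 2, 6)}, cv=5)
gs.fit(X, y)
print(gs.best_params_)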